1. INTRODUCTION

In the era of the Internet and near-unlimited access to information, network security has become one of the most important means of protecting confidential data and information from unauthorized third-party access [1, 2]. The Network Intrusion Detection System (NIDS) is an important field of research because it addresses many aspects of real-time network security. An NIDS must detect intrusions autonomously and send the gathered information to the responsible authority [3-5]. A network intrusion is any action that breaks into a system illegally, without the owner's consent.

Many applications on today's computer systems run without full intervention or monitoring by the personnel in charge. This, together with the restricted computational and communication resources of the computer network, increases the possibility of intrusions and unauthorized access to the network [6-9]. Computer networks should not rely solely on human action to prevent or overcome such illegal access. A dedicated security system is therefore needed to protect the confidential data and information in the network [10, 11].

In this paper, an intrusion detection technique based on a metaheuristic approach, the Genetic Algorithm (GA), was developed for use in a computer network. The method identifies and calculates the differences between the behaviour of unauthorized connections and normal connections using a proposed fitness (objective) function [12, 13]. The proposed technique was executed in two phases: training and testing. The dataset used in this study consists of a wide variety of intrusion connections simulated in a military network environment and is one of the most investigated datasets in this area [14-16].

D. U. S. Rajkumar and R. Vayanaperumal proposed deploying the Leader Based Intrusion Detection System (LBIDS) in an access network to detect and prevent DoS attacks such as Sybil and Sinkhole [16]. They addressed three core security challenges, namely authentication, DoS attack prevention and positive incentive provision, and implemented the simulation in NS2. The simulation results showed that the proposed approach can fulfil the quality of service in the network. Meanwhile, a group from the American University of Kuwait designed an efficient intrusion detection framework for WSNs and recommended a new method that helps in detecting and confining intrusive actions in the network. In this work, M. Khanafer et al. proposed a new beacon-enabled 802.15.4 MAC aimed at improving power conservation without undermining other important performance parameters [17]. A Markov mathematical model demonstrated that the approach achieves its goals without affecting the other important performance metrics. J. Afzal et al. studied two common attacks that often occur in Wireless Local Area Networks (WLANs) [18]. Their approach detects the attacks by applying the concept of a Wireless Intrusion Detection System (WIDS). They obtained detection accuracies of 89% and 93% for the two attacks, which indicates the efficiency of the proposed attack signatures in WSN.

The same method, with a few additional improvements, was proposed by M. S. Hoque, M. A. Mukit and A. N. Bikas [19]. In their experiments, they included two further types of connection that an IDS can detect: Remote to User (R2L) attacks and User to Root (U2R) attacks. In 2013, a research group from the University of Mumbai proposed the same approach with a small improvement. P. U. Kadam and P. P. Jadhav proposed accurate and effective rule generation for detecting different categories of abnormal connection [20]. At least 7 rules were created to identify each record and detect attack connections. As an enhancement, they used the Weka tool to remove redundant data from the KDD'99 Cup dataset in order to improve the detection rate and system performance. Unlike the previous paper, this group also treated the R2L and U2R attack types as main categories. In their results, DoS attacks had the highest detection rate, followed by Probe and normal connections, at 97.80%, 81.25% and 76.12% respectively. The detection rates for R2L and U2R remained low, at 23% and 30.70% respectively. Overall, the detection rate for some attack connections, such as DoS attacks, remained above 90%, while the rates for Probe attacks and normal connections declined compared with the previous paper.

The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is a type of GA with multiple objectives. Its use was proposed by A. Tamimi, D. S. Naidu and S. Kavianpour in 2015 [21]. Their method considers connection features and generates rules using two different fitness functions. The results were optimized by defining the different objectives in NSGA-II. As a result, they met their objective, which was to carry the effect of each feature into subsequent generations rather than ignoring it, and to sum these effects so that no feature is ignored.

S. Akbar et al. compared GA with the Decision Tree (C4.5) algorithm in 2012 [22]. The C4.5 algorithm was used to create a set of rules that can recognize and classify different patterns of attack connections. They created six rules to classify six types of attack connections. These attacks fall into 4 categories: DoS, remote to local (R2L), U2R and probing attacks. The two algorithms were tested separately to compare their performance. The experiments showed that the enhanced GA achieved a higher detection rate than the enhanced C4.5. The false positive rate (FPR) also favoured the enhanced GA, which had a smaller value than the enhanced C4.5 algorithm. From these results, it can be concluded that GA performs better than the C4.5 algorithm.

An optimized IDS using GA was proposed by S. Kumar and S. Dalal [23]. They extended the rule generation set by integrating it with a network sniffer to detect Denial of Service (DoS) attacks. Using the KDD'99 Cup dataset, they separated the data into training and testing parts, and GA was applied to the training part. The testing data was also combined with the network sniffer and the generated rule set. As a result, the system was able to stop attacks by terminating their connections. They reported a 97% intrusion detection rate with this method.

S. E. Benaicha et al. from Algeria proposed an IDS using GA with an improved selection operator and initial population [24]. The system was tested on the Network Security Laboratory Knowledge Discovery and Data Mining (NSL-KDD99) benchmark dataset. It was implemented in Java in the NetBeans environment, with the data stored in a MySQL database. Their experiments reached a 99.74% detection rate and a 3.74% False Positive Rate (FPR), that is, a high detection rate with a low FPR.

S. M. Gaffer, M. E. Yahia and K. Ragad studied how to improve multiclass classification accuracy for IDS [25]. They introduced a Genetic Fuzzy System (GFS) method for IDS, a hybrid of a fuzzy logic classifier and GA. A fuzzy association rule based classification method was used to obtain a compact and accurate classifier at low computational cost. They obtained a detection rate and accuracy of more than 90% for DoS attacks, Probe attacks and normal connections, and about 73% or more for the rest. This shows that the approaches proposed in their paper are effective.

D. Narsingyani and O. Kale proposed a GA to reduce false positives in IDS [26]. They used the same fitness function for rule generation that was proposed by S. E. Benaicha et al [9]. Unlike that work, they used the KDD'99 Cup dataset to evaluate the detection system. The fields used in the experiments were Duration, Protocol, Service, Flag, Source byte, Destination byte and Attack-Name. The system was implemented in Java on top of the third-party package JGAP (a GA/GP Java toolkit). Their results showed that the FPR was reduced by increasing the number of rules in the training data.


2. RESEARCH METHOD 

As described earlier, an intrusion is an attack that can harm the computer network. An intruder can access the system to steal stored information or obtain someone else's knowledge through their network. GA is therefore proposed to address this issue. Its implementation for network intrusion detection involves several steps.

Figure 1 shows the step-by-step implementation of GA for the Network Intrusion Detection System during the training process. The method is detailed below.





Step 1: Generate 100 chromosomes randomly.
Step 2: Perform attack recognition between the generated chromosomes and the training data.
Step 3: Apply the fitness function to measure the fitness value.
Step 4: Sort the chromosomes from the highest to the lowest fitness value.
Step 5: Select the 10 chromosomes with the highest fitness values.
Step 6: Clone each of the 10 chromosomes 5 times.
Step 7: Perform crossover between 2 parent chromosomes.
Step 8: Mutate one of the features in the chromosomes.
Step 9: Calculate the fitness value.
Step 10: Sort the chromosomes from the highest to the lowest fitness value.
Step 11: Select the 30 chromosomes with the highest fitness values.
Step 12: Form the next population from the top 20 chromosomes by fitness value, the top 30 crossover offspring by fitness value and 50 randomly generated chromosomes.
Step 13: Repeat the attack recognition between the population of 100 and the training data 30 times.
Step 14: The final population is obtained for the testing process.

Figure 1. Training process using GA in IDS


2.1. Separate data set into training and testing data set 

The KDD Cup 99 data set has 41 features that represent the variables used in a computer network [12]. Analyzing all of these variables is time-consuming and computationally expensive. For this reason, this research focuses on the eight most important features and 6 types of attacks. The dataset contains 284,948 connection records, of which 10% were selected as testing data using a pre-set probability value. At the beginning of this study, three values were investigated for the selection process: 0.2 (20%), 0.3 (30%) and 0.5 (50%). The probability value determines whether a given connection record is inserted into the pool of testing data. Once the pool is full (it is set to hold 10% of the total data), the selection process stops. The remaining 90% of the data, 256,454 connections in total, was used during the training process.
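The selection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the records are read once in file order, and the names `split_dataset`, `p_select` and `test_fraction` are hypothetical.

```python
import random


def split_dataset(records, p_select=0.5, test_fraction=0.10, seed=0):
    """Split records into (training, testing).

    Assumption: each record is offered to the testing pool with
    probability p_select until the pool holds test_fraction of all
    records; every other record goes to the training set.
    """
    rng = random.Random(seed)
    pool_size = round(len(records) * test_fraction)
    training, testing = [], []
    for record in records:
        if len(testing) < pool_size and rng.random() < p_select:
            testing.append(record)
        else:
            training.append(record)
    return training, testing


if __name__ == "__main__":
    data = list(range(1000))  # stand-in for connection records
    train, test = split_dataset(data, p_select=0.2)
    print(len(train), len(test))
```

Under this single-pass assumption, a larger `p_select` fills the pool from an earlier portion of the records, which is one way the probability value can change which connections end up in the testing data.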

The two main classes of attack considered are Denial of Service (DoS) and probing attacks. For each field, the maximum and minimum values were determined programmatically. String values were represented as numbers and sorted in ascending order. For example, for the 'protocol' feature, UDP, TCP and ICMP were changed to 10, 11 and 12 respectively. The proposed NIDS begins with the training process. For the GA used in this study, the initial population of 100 chromosomes was generated randomly within the range of each field.
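A minimal sketch of the encoding and initialization steps is given below. Only the protocol codes (10, 11, 12) and the population size (100) come from the text; the assumption that every field is integer-valued, and the helper names `field_ranges` and `initial_population`, are illustrative.

```python
import random

# Numeric codes for the 'protocol' feature, as stated in the text.
PROTOCOL_CODES = {"udp": 10, "tcp": 11, "icmp": 12}


def encode_protocol(name):
    """Map a protocol string to its numeric code."""
    return PROTOCOL_CODES[name.lower()]


def field_ranges(records):
    """Return the (minimum, maximum) value of every field position."""
    return [(min(column), max(column)) for column in zip(*records)]


def initial_population(ranges, size=100, seed=0):
    """Draw each gene uniformly from the range of its field.

    Assumption: fields are integers, so randint (inclusive) is used.
    """
    rng = random.Random(seed)
    return [[rng.randint(lo, hi) for lo, hi in ranges] for _ in range(size)]


if __name__ == "__main__":
    sample = [[encode_protocol("TCP"), 0, 5], [encode_protocol("ICMP"), 3, 1]]
    ranges = field_ranges(sample)
    print(ranges, initial_population(ranges)[0])
```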
2.2. Fitness function

The newly generated chromosomes, which represent potential solutions, were compared with the training data to detect any attack pattern stored in the database. All features were evaluated against the training data set to find the fitness value. The purpose of the fitness function is to select the best-fit individuals, which proceed to the next stage and create the next generation of chromosomes. In this paper, the proposed fitness function is given by the following formula [1]:


$$F = \frac{a}{A} - \frac{b}{B} \tag{1}$$

where 'a' is the number of attacks detected when the population is compared with the data set, 'A' is the total number of attacks in the data set, 'b' is the number of normal connections detected, and 'B' is the total number of normal connections in the data set. The fitness value of each chromosome therefore lies between -1 and 1. A positive value means that the fraction of attacks correctly classified exceeds the fraction of normal connections detected. A chromosome is considered to be of good quality if its fitness value is close to 1. Before selection, the chromosomes were sorted from the highest to the lowest fitness value so that the good-quality chromosomes could be selected. In this study, the 10 fittest individuals were selected for the next stage, which is crossover.
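The fitness evaluation and selection can be illustrated with the short sketch below, which computes the attack detection ratio a/A minus the normal detection ratio b/B. The text does not specify how a chromosome is matched against a connection record, so exact equality on every feature is assumed here; `matches`, `fitness` and `select_fittest` are hypothetical names.

```python
def matches(chromosome, features):
    """Assumed matching rule: every feature value must be equal."""
    return list(chromosome) == list(features)


def fitness(chromosome, training):
    """Return a/A - b/B for one chromosome.

    training is a list of (features, is_attack) pairs.
    a, b: attack and normal records matched by the chromosome.
    A, B: total attack and normal records in the training data.
    """
    A = sum(1 for _, is_attack in training if is_attack)
    B = len(training) - A
    a = sum(1 for f, is_attack in training if is_attack and matches(chromosome, f))
    b = sum(1 for f, is_attack in training if not is_attack and matches(chromosome, f))
    attack_ratio = a / A if A else 0.0
    normal_ratio = b / B if B else 0.0
    return attack_ratio - normal_ratio


def select_fittest(population, training, k=10):
    """Sort from highest to lowest fitness and keep the top k."""
    ranked = sorted(population, key=lambda c: fitness(c, training), reverse=True)
    return ranked[:k]
```

A chromosome that matches every attack and no normal connection scores 1, while one that matches only normal connections scores a negative value, which is consistent with the range of -1 to 1 stated above.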


2.3. Crossover and Mutation 
After the 10 fittest individuals were selected based on their fitness scores, each chromosome was cloned 5 times, producing 50 parent chromosomes in total. An example of this process is shown in Figure 2.

Figure 2. Process of crossover


Crossover occurs according to a predetermined crossover rate, which determines how many features of each parent are inherited by the two offspring, as shown in Figure 2. The resulting offspring must stay within the range of allowable values in every field. Of the 50 cloned chromosomes, 25 were selected as 'parent 1' and the remaining 25 as 'parent 2'. Crossover produced 50 new chromosomes (offspring). These chromosomes were then mutated, according to the mutation probability, to slightly change the gene(s) of the new chromosomes. Mutation was performed using a single mutation strategy.
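The cloning, crossover and mutation steps might look like the sketch below. Several details are assumptions, since the text does not fix them: parents are paired after a random shuffle, crossover swaps each gene between the two parents with probability `rate` (a uniform crossover), and mutation redraws one randomly chosen gene within its field range. The default values of `rate` and `p_mutation` are placeholders, not values from the paper.

```python
import random


def clone_parents(fittest, copies=5):
    """Clone each selected chromosome `copies` times (10 x 5 = 50)."""
    return [list(c) for c in fittest for _ in range(copies)]


def crossover(parent1, parent2, rate, rng):
    """Swap genes position by position with probability `rate`.

    Genes never change position, so each offspring gene stays inside
    the allowable range of its field.
    """
    child1, child2 = list(parent1), list(parent2)
    for i in range(len(child1)):
        if rng.random() < rate:
            child1[i], child2[i] = child2[i], child1[i]
    return child1, child2


def mutate(chromosome, ranges, p_mutation, rng):
    """Single mutation: with probability p_mutation, redraw one gene."""
    mutated = list(chromosome)
    if rng.random() < p_mutation:
        i = rng.randrange(len(mutated))
        lo, hi = ranges[i]
        mutated[i] = rng.randint(lo, hi)
    return mutated


def make_offspring(fittest, ranges, rate=0.5, p_mutation=0.1, seed=0):
    """Produce mutated offspring from the cloned parents."""
    rng = random.Random(seed)
    parents = clone_parents(fittest)
    rng.shuffle(parents)  # assumed pairing rule
    half = len(parents) // 2
    offspring = []
    for p1, p2 in zip(parents[:half], parents[half:]):
        offspring.extend(crossover(p1, p2, rate, rng))
    return [mutate(c, ranges, p_mutation, rng) for c in offspring]
```

Swapping genes in place is one simple way to satisfy the requirement that offspring remain within the allowable range of every field.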


2.4. Generation of new chromosomes population 

After the 50 new chromosomes were mutated, the fitness value of each was calculated using the same fitness function as before. The mutated chromosomes were then sorted from the highest to the lowest fitness value. The thirty chromosomes with the best fitness values were selected as part of the population for the next iteration. The new population also comprises 20 good-quality chromosomes taken from the initial population, while the remaining 50 chromosomes are generated randomly. This new population undergoes the same steps iteratively until the pre-set stopping criterion is met. The final population of chromosomes is used in the next step, the testing process.
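The composition of the next population (20 + 30 + 50 = 100) can be sketched as follows. The sketch assumes that the "20 good-quality chromosomes" are the 20 fittest members of the population that entered the iteration, and that both input lists are already sorted from highest to lowest fitness; `next_population` is a hypothetical name.

```python
import random


def next_population(current_sorted, offspring_sorted, ranges, seed=0):
    """Assemble the population for the next iteration.

    current_sorted:   current population, best fitness first
    offspring_sorted: mutated offspring, best fitness first
    ranges:           (min, max) per field, for the random chromosomes
    """
    rng = random.Random(seed)
    elite = [list(c) for c in current_sorted[:20]]
    best_offspring = [list(c) for c in offspring_sorted[:30]]
    fresh = [[rng.randint(lo, hi) for lo, hi in ranges] for _ in range(50)]
    return elite + best_offspring + fresh
```

Keeping 50 freshly generated chromosomes in every iteration maintains diversity, at the cost of re-evaluating many low-fitness candidates.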


2.5. Testing process 

Figure 3 describes the testing process of the intrusion detection system using GA. The final population of chromosomes obtained from the training process is used during the testing phase. Recognition takes place between the final set of chromosomes and the pre-selected 10% of connection data from the original dataset. The success rate was calculated from the number of attack connections that were recognized.





Step 1: Read the testing data one record at a time.
Step 2: Present the testing record to each of the chromosomes.
Step 3: Compare all features of the testing record with the chromosomes.
Step 4: If any of the chromosomes resembles the testing record, an attack is detected. Otherwise, the record is not an attack.
Step 5: Calculate the true positive rate of the prediction.

Figure 3. Testing process of the IDS
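The testing steps in Figure 3 can be sketched as below. As in training, the text does not define when a chromosome "resembles" a record, so exact equality on all features is assumed; the function names are illustrative.

```python
def is_attack_detected(features, population):
    """Steps 2-4: flag the record if any chromosome matches it.

    Assumption: a match means equality on every feature.
    """
    return any(list(c) == list(features) for c in population)


def true_positive_rate(testing, population):
    """Step 5: percentage of attack records that were detected.

    testing is a list of (features, is_attack) pairs.
    """
    attacks = [f for f, is_attack in testing if is_attack]
    if not attacks:
        return 0.0
    detected = sum(1 for f in attacks if is_attack_detected(f, population))
    return 100.0 * detected / len(attacks)


if __name__ == "__main__":
    final_population = [[11, 1], [10, 2]]
    testing = [([11, 1], True), ([12, 0], True), ([10, 2], True),
               ([9, 9], True), ([11, 1], False)]
    print(true_positive_rate(testing, final_population))
```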


3. RESULTS AND ANALYSIS 

In the proposed method, the diversity of the raw data and the way it is processed influence the final results. In the testing process, the testing dataset was compared with the set of chromosomes obtained during the training process. In the initial experiments, which divided the dataset into 2 parts (training and testing), three probability values, 0.2, 0.3 and 0.5, were tested to determine the best one.

Table 1 shows the results collected using the three probability values. These values were used to select 10% of the data from the raw dataset for the testing phase. The fitness value, which determines the success rate of detecting intrusions, increases from iteration to iteration until it settles at a certain value. As shown in the table, the final fitness value of the population decreases as the selection probability increases.


Table 1. Fitness value for each of the 0.2, 0.3 and 0.5 probabilities of random selection
                          Fitness value
Iteration   Probability: 0.2   Probability: 0.3   Probability: 0.5
1           0.000153139        0.000188872        0.000204186
2           0.000245023        0.000214395        0.195597
3           0.000245023        0.00392547         0.195597
4           0.000245023        0.166419           0.195597
5           0.000245023        0.166419           0.195597
6           0.00415008         0.166419           0.195597
7           0.00415008         0.166419           0.195597
8           0.00415008         0.166419           0.409459
9           0.00415008         0.166419           0.409459
10          0.00415008         0.166419           0.409459
11          0.00415008         0.166419           0.409459
12          0.00415008         0.166419           0.409459
13          0.00415008         0.166419           0.409459
14          0.00415008         0.248025           0.409459
15          0.00415008         0.248025           0.409459
16          0.460567           0.248025           0.409459
17          0.460567           0.453441           0.409459
18          0.460567           0.453441           0.409459
19          0.460567           0.453441           0.409459
20          0.460567           0.453441           0.409459
21          0.460567           0.453441           0.409459
22          0.460567           0.453441           0.409459
23          0.460567           0.453441           0.409459
24          0.460567           0.453441           0.409459
25          0.460567           0.453441           0.409459
26          0.460567           0.453441           0.409459
27          0.460567           0.453441           0.409459
28          0.460567           0.453441           0.409459
29          0.460567           0.453441           0.409459
30          0.460567           0.453441           0.409459
31          0.460567           0.453441           0.409459
32          0.460567           0.453441           0.409459
33          0.460567           0.453441           0.409459
34          0.460567           0.453441           0.409459
35          0.460567           0.453441           0.409459

Table 2 shows the success detection rate and the average over 30 runs for the three selection probabilities investigated in this study. The experiment was repeated thirty times to find the average value. The results indicate that the average success rate increases with the probability value, as illustrated in Figure 4.


Table 2. Success rate of intrusion detection based on each probability value and its average
Probability of random data selection   Success rate (%)   Average success rate (30 runs)
0.2                                    92.6476            93.6754
0.3                                    98.3119            93.764
0.5                                    99.9825            99.8631





The success rate is calculated from the number of attacks that were recognised during the testing process. The probability value affects which connection records are selected from the raw dataset. As a result, the chromosomes produced during the training process may be similar to most of the data selected for the testing phase. Therefore, the higher the probability, the more positive detections are produced.


[Bar chart titled "Success rate for data recognition". Vertical axis: Percentages. Horizontal axis: Probability of random selection. Bars: 20% = 92.6476, 30% = 98.3119, 50% = 99.9825.]


Figure 4. Success rate for three different selection probabilities 


4. CONCLUSION 

In this study, the principle of evolution in the Genetic Algorithm was discussed and used to generate solutions for network intrusion detection. The fitness value indicates the quality of a chromosome (candidate solution), that is, how well it detects a set of predetermined attack connection data during the training process. The proposed method combines the genetic operators of cloning, crossover and mutation to generate new chromosomes. These genetic processes were conducted to produce good-quality chromosomes with high fitness values under the objective function. Such chromosomes have a high chance of recognizing attack connections in the network, which leads to intrusion detection. Based on the presented results, the proposed method is capable of detecting intrusion connections in a network and is shown to be a good mechanism for making computer networks more secure.